ABSTRACT 



The Longitudinal Evaluation of School Change and Performance 
(LESCP) analyses were organized around policies embodied in Title I of the 
Elementary and Secondary Education Act. This study measured changes in 
student performance in seventy-one high-poverty schools, which included many 
schools that had implemented standards-based reform policies. The study 
focused on: (1) analyzing student outcomes in reading and mathematics 
associated with specific practices in classroom curriculum and instruction; 
and (2) broadening its lens to learn about policy conditions, especially with 
regard to standards-based reform, under which the potentially effective 
classroom practices were likely to flourish. The report used data from 
standardized achievement tests, teacher surveys, district administrator and 
principal surveys, focus groups of school staff and parents, classroom 
observations, state and district policy statements, and student records. A 
particular asset of the study was its longitudinal database on more than 
1,000 individual students, with student performance data linked to teachers' 
survey responses. This enabled researchers to pursue a detailed study of 
student performance in relation to the specific instructional conditions 
experienced by those students over time, with statistical controls for 
important background variables such as poverty. While confirming the harmful 
effects of poverty on the performance of students and schools, the findings 
point to some instructional practices that may boost performance in reading 
and mathematics. They also show an association between some of those 
practices and the standards-based reform approach of the current Title I 
program. The report contains four technical appendices. (SM) 
INTRODUCTION: STUDY PURPOSES, DESIGN, AND SAMPLE CHARACTERISTICS 



The Longitudinal Evaluation of School Change and Performance (LESCP) design and 
analyses were organized around the policies embodied in Title I of the Elementary and Secondary 
Education Act, as amended in 1994. Since its original enactment in 1965, Title I has been intended to 
improve the learning of children in high-poverty schools, with a particular focus on those children whose 
previous achievement has been low. Therefore, this study measured changes in student performance in a 
sample of Title I schools, and its analyses included a special look at those students with initially low 
achievement. The study was in the tradition of past work addressing school practices and policies that 
can contribute to higher achievement in Title I schools. 1 The study had a dual focus: (1) analyzing the 
student outcomes associated with specific practices in classroom curriculum and instruction; and then (2) 
broadening its lens to learn about the policy conditions — especially with regard to standards-based 
reform — under which the potentially effective classroom practices were likely to flourish. 

The second focus considers the provisions of Title I enacted in 1994 that strongly encourage 
states, school districts, and schools to pursue a standards-based approach to educational improvement. 
The standards-based approach relies on aligned frameworks of standards, curriculum, student assessment, 
and teacher professional development to set clear goals for student performance and to help organize 
school resources around those goals. It is an approach that several states and some large districts began to 
put in place earlier in the 1990s. Several large federal programs, prominently including Title I, adopted 
the philosophy of standards-based reform in 1994. 

This chapter describes the conceptual model used for the study’s data collection and analysis 
and how the model was implemented for the study. It then describes the variation found in the study’s 
purposive sample with regard to standards-based reform policies and school characteristics. The final 
section of the chapter highlights the major data sources for the study. 



1 See, for example, D.R. Lee, R.A. Carriere, A.H. MacQueen, L.H. Poynor, and M.S. Rogers. (1981). Successful practices in high-poverty 
schools (Technical Report No. 16), in the Study of the Sustaining Effects of Compensatory Education on Basic Skills. Santa Monica, Calif.: 
System Development Corporation; M.M. Kennedy, B.F. Birman, and R.E. Demaline. (1986). The effectiveness of Chapter 1 services. 
Washington, D.C.: U.S. Department of Education; M.S. Knapp, P.M. Shields, and B.J. Turnbull. (1992). Academic challenge for the children 
of poverty. Washington, D.C.: U.S. Department of Education. 
The Conceptual Model and How It Was Implemented 

The study’s conceptual framework, depicted in Figure 1-1, shows the study’s design for 
tracing the complicated path by which policy might affect student performance. Beginning on the right of 
the framework, Box 4 in Figure 1-1 represents the student-level goal of Title I and other policies, 
improved student achievement. In this study, most of our analyses used the Stanford Achievement Test, 
Ninth Edition (SAT-9) tests of reading and mathematics as measures of student achievement. Students 
took the SAT-9 tests in the third and fourth grades in 1997, the fourth grade in 1998, and the fourth and 
fifth grades in 1999; this permitted us to track the performance gains of individual students over time and 
also the performance of successive cohorts of fourth graders. We discuss the specifics of these measures, 
including pros and cons, in Chapter 2 of the report. For purposes of the conceptual framework, we note 
merely that some of our analyses focused on the scale scores attained by students in a particular grade, 
and others focused on the score gains made by individual students over 2 years (from third grade to fifth 
grade). 



Box 3 in Figure 1-1 represents proximal variables, those that might plausibly exert a direct 
influence on student achievement in reading and mathematics. They include classroom curriculum (what 
was taught) and instruction (how the material was taught). For kindergarten to fifth-grade classrooms in 
schools participating in this study, teachers responded to questionnaires about their curriculum and 
instruction in reading and mathematics in all three study years. They also answered questions about their 
beliefs, the professional development they received, and their outreach to parents. In addition to measures 
for individual teachers, Box 3 also includes measures of curriculum, instruction, and instructional 
support (such as professional development) averaged at the school level. This is because instructional 
influences on students may come from features of the whole school environment, not just one classroom. 

Figure 1-1. Conceptual framework (panel labels: Context; Implementation) 

Demographic conditions, especially individual and school poverty, can be important 
influences on student achievement. Thus, the analyses that explored the connections between Box 3 and 
Box 4 in Figure 1-1 paid attention to control variables such as poverty and school size. 

At the same time that the study viewed the Box 3 variables (in Figure 1-1) as proximal 
inputs to student achievement (Box 4 in Figure 1-1), it also treated them as outcome measures. Logically, 
one would expect curriculum, instruction, and instructional supports to reflect a combination of influences 
that would include policies from the local, state, and federal levels (Boxes 1b and 1c in Figure 1-1) and 
socioeconomic conditions impinging on the school (Box 1a in Figure 1-1). These influences would be 
filtered through a school’s implementation choices, which are represented by Box 2 in Figure 1-1 — for 
example, what professional development the school offered, and how vigorously the principal sought to 
impress the importance of outside standards on the teachers. The surveys developed for this study 
permitted us to understand Box 2 from the perspective of all the teachers in the school, using our 
measures of the extent to which teachers had actually participated in professional development or reported 
that they were familiar with standards. While these measures give an indirect window on what the school 
did to promote implementation of outside policies, we would argue that it was an important window, 
showing what messages from the school were received by the teachers. 

Looking across Figure 1-1, readers will notice that policies appear not as direct influences on 
student performance but as influences that are mediated through school-level policy implementation and 
teacher-level practices. This reflects our conviction that a policy enacted in Washington, D.C.; a state 
capital; or a school district’s central office cannot possibly affect student performance unless and until 
schools and teachers do something different. Accordingly, as just described, the LESCP analysis built 
models of student performance that tested the possible influence of many variables drawn from Box 3 in 
Figure 1-1. Having done that, we then treated Box 3 variables as outcomes and explored the extent to 
which important Box 3 variables were found in environments of standards-based reform. 
LESCP Sample 



This study looked in depth at a purposive sample of state and local policy environments 
rather than employing a larger, nationally representative sample of Title I schools. To assess school and 
classroom responses to standards-based reform, the study focused on states and districts that had enacted 
standards-based reform some years earlier. 2 The states varied in their approach to reform — for example, 
some put high-stakes assessment in place early on, while others began their reforms with a process of 
developing content standards. Of the seven states in the sample, five were arguably embarked on some 
version of standards-based reform in 1996 when the sample was drawn. Although the other two states 
were doing less with standards-based reform in 1996, they moved in that direction over the course of the 
study, and one of them moved quite rapidly. This left the study with somewhat less variation at the state 
level than was originally expected. The 18 participating districts presented a similar pattern: none was 
untouched by standards-based reform, although they varied in the alacrity and thoroughness with which 
they enacted each of several kinds of standards-based policies, both in response to state requirements and 
on their own initiative. In short, the LESCP schools were subject to some variation in the kinds of 
policies enacted by their states and districts — but it is important to recognize that all were subject to some 
policy activity in standards, assessment, or accountability. 

To better understand the policy environments that schools were operating in, we used 
documents provided by the districts’ offices to create ratings for each of the 18 districts on several 
indicators of standards-based reform policies in 1998. We focused this analysis at the district level to 
capture both state and district policy. Taking this approach was necessary because our sample 
deliberately included pairs of states and districts that had initially taken different policy stances on 
standards-based reform. The sample included, for example, districts that state officials described as 
reluctant to implement an aggressive standards-based agenda initiated at the state level. It also included 
districts that had independently established their own standards-based framework in the absence of such a 
framework statewide. 

2 The study could not meet its mandate simply by looking at schools’ responses to the provisions of the 1994 law because the timeline for the 
study does not mesh well with the timeline for the law’s implementation. For example, Title I does not require full implementation of 
standards-based accountability for schools until the school year 2000-01, 2 years after LESCP data collection ends. Thus, only in those states 
and districts that had already enacted standards-based reform some years ago, before the Title I provisions were enacted, would it be possible 
to expect widespread, classroom-level effects as early as the LESCP data collection period. 

The 71 schools in the sample all received Title I funds, and most had very high levels of 
poverty. Of the 71 schools, 59 were operating schoolwide programs in 1998-99 (up from 58 in 1997-98 
and 54 in 1996-97). This reflected, in part, the high levels of poverty in participating schools: 15 schools 
had more than 90 percent of their students living in poverty, 25 schools had between 75 percent and 
90 percent, 21 schools between 50 and 75 percent, and 10 schools had fewer than 50 percent. In all 
schools, the poverty rate was higher than 35 percent. 
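The poverty breakdown above can be checked with a short tally. The following is a minimal sketch, not part of the study's analysis: the band labels and the `poverty_band` helper are hypothetical names introduced here, and because the report does not say how schools at exactly 50, 75, or 90 percent were assigned, the boundary handling is an assumption.

```python
# School counts by poverty band, as reported in the text.
SCHOOLS_BY_BAND = {
    "more than 90 percent": 15,
    "75 to 90 percent": 25,
    "50 to 75 percent": 21,
    "fewer than 50 percent": 10,
}


def poverty_band(pct_poor: float) -> str:
    """Assign a school to a band (hypothetical helper; boundary handling is assumed)."""
    if pct_poor > 90:
        return "more than 90 percent"
    if pct_poor >= 75:
        return "75 to 90 percent"
    if pct_poor >= 50:
        return "50 to 75 percent"
    return "fewer than 50 percent"


# The four bands should account for every school in the sample.
total_schools = sum(SCHOOLS_BY_BAND.values())

# Share of the sample in each band, rounded to whole percentages.
shares = {band: round(100 * n / total_schools) for band, n in SCHOOLS_BY_BAND.items()}

print(total_schools)
print(shares)
```

The tally confirms that the four bands account for all 71 schools, and that a majority of the sample falls in the two highest-poverty bands.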



Data Sources 

This report is based on three rounds of data gathered in spring 1997, spring 1998, and spring 
1999 from students and school staff. The LESCP study collected repeated measures of students’ 
performance, teachers’ reported behavior and opinions, and the school’s policy environment in 
71 schools. These schools, all of which receive funds under Title I, were nested in a purposively selected 
sample of 18 districts in 7 states. The schools were not statistically representative of high-poverty schools 
in the nation as a whole, in their states, or even in their districts. However, the study provided a rich 
database that permitted the analysis of differences across students, classrooms, schools, and policy 
environments at any one time and also across school years. A summary of the data collected is shown in 
Table 1-1. 



This report draws most heavily on three of the study’s data sources: 



■ The tests administered to students who were in the third grade in 1997, fourth grade in 
1998, and fifth grade in 1999; 

■ Surveys completed by teachers in each year regarding topics that include their 
classroom curriculum and instruction in reading and mathematics, their knowledge 
and instruction with regard to standards-based reform and instruction, their 
professional development over the past 12 months, and their outreach to parents of 
low-achieving students; and 

■ Documents collected from school districts regarding policies related to standards-based 
reform. 

Data on schools’ performance on state or local assessments were also collected and used. 
First, we analyzed the relationship between school performance on the state assessments and on the 
SAT-9, relative to the rest of the sample schools in that state. We also looked at the extent to which the 
study’s proximal variables, averaged at the school level, were related to trends over time in schools’ 
performance on their state tests. 
Table 1-1. Summary of data collected 

Contents of This Report 



This report essentially works its way from right to left in the conceptual model. The next 
chapter describes the achievement of students in the sample, both the full sample and the smaller group of 
students who were followed over a 2-year period as they moved from third grade through fifth grade. In 
Chapter 3, we present the results of the analysis of influences on student achievement: what instructional 
conditions were associated with higher levels of student performance, either initially or over time. We 
use those findings in Chapter 4 to identify the extent to which students had access to favorable 
instructional conditions — the variables that seemed to matter for achievement — if they lived in poverty or 
if their state or district had enacted particular dimensions of standards-based reform. Conclusions appear 
in Chapter 5. 

This report has four technical appendixes. In Appendix A, we describe in detail the 
statistical approach we used in examining the relationships among student achievement, student and 
school characteristics, and classroom instructional practices. Appendix B consists of tables that show 
how the various measures of instructional practices that we used changed over the 3 years of data 
collection. We performed some secondary analyses of factors related to student test scores. The results 
of these analyses are reported in Appendix C. In Appendix D, we present the results of the reliability 
analyses for the indices we constructed to measure instructional practices. 
OVERALL STUDENT PERFORMANCE ON TESTS 



The student-level goal of Title I and other policies is improved student achievement (see 
Box 4 in Figure 1-1). A major source of student achievement data for this study was the student testing 
conducted with third and fourth graders in spring 1997, with fourth graders in spring 1998, and with 
fourth and fifth graders in spring 1999. This chapter describes the standardized tests; the students who 
took each of the tests in spring 1997, 1998, and 1999 (as well as the extent and causes of missing data); 
overall results for all the students and for the subset of students who were tested in all 3 years; and a 
comparison of performance at the school level between the standardized tests and states’ own 
assessments. 



This chapter describes student performance as background to an investigation in Chapter 3 
of the relationship between the proximal variables associated with classroom and school instructional 
practices and student performance. The data highlight the fact that, on average, students in the 
Longitudinal Evaluation of School Change and Performance (LESCP) schools underperform in reading 
and in mathematics in the third grade when compared with national norms and, on average, do not 
close the gap by the fifth grade. The data also show strong correlations between both student-level and 
school-level poverty and student achievement. This sets the context for the Chapter 3 analyses that seek 
to identify classroom and school-level practices that work to overcome the effects of poverty and poor 
performance in the early grades. 



Standardized Tests 

The study administered norm-referenced achievement tests in reading and mathematics, the 
Stanford Achievement Test, Ninth Edition (SAT-9), to participating students. According to the publisher, 
the closed-ended mathematics test aligns with the National Council of Teachers of Mathematics standards 
in effect during the LESCP field period, 1 and one section of the closed-ended reading test (the reading 
comprehension subtest) aligns with the National Assessment of Educational Progress. 2 Separate scores 
were obtained for each of the four tests in spring 1997, 1998, and 1999: 

■ Overall closed-ended reading; 

■ Open-ended reading; 

■ Overall closed-ended mathematics; and 

■ Open-ended mathematics. 

The closed-ended reading test is composed of two subtests, vocabulary and comprehension, 
at the grades administered in the LESCP. The vocabulary subtest assesses vocabulary knowledge and 
skills with synonyms, context clues, and multiple word meanings. The reading comprehension subtest 
uses a reading selection followed by multiple choice questions to measure modes of comprehension 
(initial understanding, interpretation, critical analysis, and process strategies) within the framework of 
recreational, textual, and functional reading. The open-ended reading test contains a narrative reading 
selection in the recreational reading content cluster followed by nine open-ended questions that measure 
initial understanding, interpretation, and critical analysis. 

The closed-ended mathematics test is composed of problem-solving and procedures subtests. 
Five processes are assessed in the problem-solving subtest: problem solving, reasoning, communication, 
connections, and thinking skills. Concepts of whole numbers, number sense and numeration, geometry 
and spatial sense, measurement, statistics and probability, fraction and decimal concepts, patterns and 
relationships, estimation, and problem-solving strategies are measured. The procedures subtest covers 
number facts, computation using symbolic notation, computation in context, rounding, and thinking skills. 
The open-ended mathematics assessment presents nine questions or tasks around a single theme. It 
assesses the ability to communicate and reason mathematically and to apply problem-solving strategies. The 
content clusters for the open-ended mathematics test are number concepts, patterns and relationships, and 
concepts of space and shape. 

The number of students for whom we have test scores varies by the test because not every 
district had its students take each component test. Both the mathematics and reading open-ended tests 
included all districts in the LESCP study. However, one district did not participate in one component of 
the closed-ended reading test, and another district’s scores in closed-ended reading were converted from 
SAT-8 scores to SAT-9 scores using equating methods suggested by the test publisher. 

1 Stanford 9 Special Report: Mathematics. (1997). San Antonio, Tex.: Harcourt Educational Measurement. 

2 Stanford 9 Special Report: Reading. (1997). San Antonio, Tex.: Harcourt Brace Educational Measurement. 



Available LESCP Test Scores 



In this report, we analyze LESCP test scores from data collected during spring 1997, 1998, 
and 1999. Table 2-1 shows the basic sources of data studied here. We had scores for the third and fourth 
grades in 1997, the fourth grade in 1998, and the fourth and fifth grades in 1999. We paid particular 
attention in the analysis to the cohort of students who were third graders in spring 1997. Because we had 
repeated measurements on many of these students, we could measure score growth with a reliable 
baseline score. 

Table 2-1. Grades tested, by year of data collection 

Year of data collection 
1997                    1998                    1999 
Third grade             No testing conducted    No testing conducted 
Fourth grade            Fourth grade            Fourth grade 
No testing conducted    No testing conducted    Fifth grade 



Table 2-2 shows the total number of third-, fourth-, and fifth-grade LESCP students tested 
for each of the four tests in spring 1997, 1998, and 1999. The minimum number of students for any test, 
grade, and year is 2,567. This is an appreciable sample and should allow us to draw reliable 
conclusions. 3 



Table 2-2. LESCP sample sizes 

                            Third grade    Fourth grade                  Fifth grade 
Test                        1997           1997      1998      1999      1999 
Reading closed-ended        2,813          2,692     2,567     3,213     3,311 
Reading open-ended          3,646          3,535     3,438     3,503     3,328 
Mathematics closed-ended    3,226          3,073     2,987     3,052     2,871 
Mathematics open-ended      3,723          3,503     3,400     3,455     3,326 


3 By comparison, the national percentile and mean estimates developed by the SAT-9 publisher are based on sample sizes of 4,000 to 5,000. 
Table 2-3 documents the reasons why some students did not take the closed-ended tests. 
Similar percentages of non-test takers were found for the open-ended tests. In 1997 and 1998, roughly 
10 percent of the parents exercised their privilege to excuse their children from participating in the study. 
This percentage increased to 14 percent in 1999. Approximately 5 percent of the students enrolled in the 
appropriate grades in the LESCP sample of schools were excused by the school from taking the test 
because the school determined that the students’ limited English or disability made test taking 
inappropriate. Schools were asked to use the criteria they used for state or districtwide testing to 
determine who should be excused for either of these reasons. Another 3 percent were absent the day of 
the test, and the test scores for yet another 3 percent or 4 percent were not available for other reasons. 



Table 2-3. Test-taking rates for each year of the study 

                                                 Closed-ended reading      Closed-ended mathematics 
                                                 1997    1998    1999      1997    1998    1999 
Took the test                                    77%     80%     77%       78%     79%     76% 
Parent refused                                   9%      10%     14%       9%      10%     14% 
Limited-English proficiency (LEP) or disabled    6%      4%      5%        6%      4%      5% 
Absent day of test                               3%      3%      3%        3%      3%      3% 
Other                                            4%      3%      2%        3%      4%      3% 
TOTAL                                            100%    100%    100%      100%    100%    100% 
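Because the entries in Table 2-3 are rounded to whole percentages, the components of a column need not add to exactly 100. The sketch below illustrates this with the closed-ended mathematics columns; the dictionary keys and helper names are hypothetical and were chosen only for this example.

```python
# Closed-ended mathematics columns of Table 2-3 (percent of enrolled students).
MATH_RATES = {
    1997: {"took": 78, "parent_refused": 9, "lep_or_disabled": 6, "absent": 3, "other": 3},
    1998: {"took": 79, "parent_refused": 10, "lep_or_disabled": 4, "absent": 3, "other": 4},
    1999: {"took": 76, "parent_refused": 14, "lep_or_disabled": 5, "absent": 3, "other": 3},
}


def column_total(rates: dict) -> int:
    """Sum of all rounded components in one column."""
    return sum(rates.values())


def not_tested(rates: dict) -> int:
    """Sum of the rounded non-participation components in one column."""
    return sum(value for reason, value in rates.items() if reason != "took")


totals = {year: column_total(rates) for year, rates in MATH_RATES.items()}
untested = {year: not_tested(rates) for year, rates in MATH_RATES.items()}

print(totals)
print(untested)
```

Each column total lies within one percentage point of 100, which is consistent with rounding of the individual entries rather than with missing categories.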



In addition to focusing our analyses on the cohort of students who were third graders in 
spring 1997, we identified a subset of these students for further analysis. These students, who were tested 
in all 3 years, were called the longitudinal sample. In contrast to the population of all students in the 
cohort, this group was less mobile and may have enjoyed other advantages. We discuss the 
characteristics of these students in Section 2.4. 



Cross-sectional Analyses 

We compared the performance of LESCP students with national and urban reference groups 
and with proficiency levels identified by the test publisher. On average, students in the LESCP sample of 
schools scored below national norms and urban norms in all years and grades tested. Table 2-4 shows 
cross-sectional data on test performance for the entire LESCP sample for 1997, 1998, and 1999. The data 
are shown in several forms: overall mean scores, the national percentile and grade-equivalent that these 
means represent, and the percentage of LESCP test takers who performed at particular competency levels 
on each test in each year. These levels, set by the SAT-9 test publisher, are described as corresponding 
to the kinds of performance levels that Title I encourages for state assessment data (e.g., “excellent,” 
“proficient,” and the like). (See Technical Data Report for the SAT-9, 1997, Harcourt Brace & Company.) 

Table 2-4. LESCP sample scores on the SAT-9 tests 



For comparison, Table 2-5 shows the national and urban norms by test. National and urban 
norms were taken from the SAT series (1996) based on a representative sample of student scores in spring 
1995. Because urban means were not available for the open-ended test, urban medians were used. On all 
tests, the LESCP students fell below the national norms by 4 points to 23 points. 4 Additionally, the 
LESCP students scored below the urban norm on all tests. 



Table 2-5. National and urban norms for SAT-9 

                            Third grade           Fourth grade          Fifth grade 
                            National   Urban      National   Urban      National   Urban 
Test                        mean       median     mean       median     mean       median 
Reading closed-ended        614        607        637        634        654        647 
Reading open-ended          586        579        606        609        629        624 
Mathematics closed-ended    600        593        624        624        646        639 
Mathematics open-ended      602        590        612        609        626        621 



On three of the four tests, the means and distributions of test scores for the LESCP sample of 
fourth graders remained essentially indistinguishable from year to year. On the open-ended reading test, 
where there was a statistically significant rise in 1998, there was a subsequent decline in 1999. 

We note that the cross-sectional results for fourth graders were obtained on three different 
fourth-grade classes in the LESCP schools. In the next section, we analyze the change within the cohort 
of students who were third graders in spring 1997. In this group, we can more accurately determine 
whether any statistically significant changes were due to the educational experiences of participating 
students during the fourth and fifth grades. 



4 The conclusions based on the differences from national and urban norms are valid if the national and urban scores have not changed 
substantially since the norming year, 1995. 
Longitudinal Sample 



For the longitudinal sample, we had a reliable baseline score and thus we could accurately 
assess the score gain made between spring 1997 and spring 1999. In comparison to using all test takers 
for this analysis, the use of the longitudinal sample for analysis has the following two disadvantages: it 
reduces the sample size; and it limits the generalizability of the conclusions to those students who spent 
third, fourth, and fifth grade at the same school. As shown below, the students in the longitudinal sample 
scored higher on standardized tests, on average, than did all students in a grade. However, the advantage 
of using the longitudinal sample is that it allows us to conduct a longitudinal study, that is, to distinguish changes 
over time within students from differences among students in their baseline levels. We believe that the 
advantages of basing conclusions on the longitudinal sample outweigh the disadvantages. 

LESCP students were tested on grade level. To be included in the longitudinal sample, a student 
who progressed from third grade to fourth grade to fifth grade had to take the third-grade form of the 
SAT-9 test in 1997, the fourth-grade form in 1998, and the fifth-grade form in 1999. 
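The inclusion rule for the longitudinal sample can be expressed as a simple filter. The sketch below is illustrative only: the record layout (a mapping from test year to the grade-level form taken) and the example students are hypothetical, not drawn from the LESCP data files.

```python
from typing import Dict

# Grade-level form required in each spring test administration.
REQUIRED_FORMS = {1997: 3, 1998: 4, 1999: 5}


def in_longitudinal_sample(forms_taken: Dict[int, int]) -> bool:
    """Return True if the student took the on-grade form in all three years.

    `forms_taken` maps a test year to the grade-level form the student took;
    a missing year means the student was not tested that spring.
    """
    return all(forms_taken.get(year) == grade for year, grade in REQUIRED_FORMS.items())


# Hypothetical students, for illustration only.
students = {
    "A": {1997: 3, 1998: 4, 1999: 5},  # tested on grade in all 3 years
    "B": {1997: 3, 1998: 4},           # not tested in 1999
    "C": {1997: 3, 1998: 3, 1999: 4},  # repeated third grade
}

longitudinal = [name for name, forms in students.items() if in_longitudinal_sample(forms)]
print(longitudinal)
```

Under this rule, students who changed schools, missed a test administration, or did not advance one grade each year fall outside the longitudinal sample, which is why the group is smaller and less mobile than the full cohort.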



Table 2-6 shows the number of students in the longitudinal sample and the mean scores for 
this group for the four tests taken in spring 1997, 1998, and 1999. The ratios of longitudinal test takers to 
third-grade test takers in 1997 ranged from 42 percent in closed-ended mathematics to 50 percent in 
closed-ended reading. That is, between 40 percent and 50 percent of those third graders tested in 1997 
were tested again as fourth graders in 1998 and fifth graders in 1999. 
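These ratios follow from the longitudinal sample sizes in Table 2-6 and the third-grade 1997 counts in Table 2-2. The sketch below reproduces the calculation; the variable names are assumptions of this example, and the figures are transcribed from the two tables.

```python
# Third-grade test takers in spring 1997 (Table 2-2).
THIRD_GRADE_1997 = {
    "Reading closed-ended": 2813,
    "Reading open-ended": 3646,
    "Mathematics closed-ended": 3226,
    "Mathematics open-ended": 3723,
}

# Students tested in all 3 years (Table 2-6).
LONGITUDINAL = {
    "Reading closed-ended": 1401,
    "Reading open-ended": 1656,
    "Mathematics closed-ended": 1358,
    "Mathematics open-ended": 1642,
}

# Longitudinal test takers as a whole-number percentage of third-grade test takers.
retention_pct = {
    test: round(100 * LONGITUDINAL[test] / THIRD_GRADE_1997[test])
    for test in THIRD_GRADE_1997
}

for test, pct in retention_pct.items():
    print(f"{test}: {pct} percent")
```

The lowest ratio is for closed-ended mathematics and the highest for closed-ended reading, with the two open-ended tests falling in between.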



Table 2-6. Sample size and mean scores for LESCP longitudinal sample 

Test                        Sample size    Mean 1997    Mean 1998    Mean 1999 
Reading closed-ended        1,401          607          628          646 
Reading open-ended          1,656          581          607          612 
Mathematics closed-ended    1,358          597          621          642 
Mathematics open-ended      1,642          586          593          617 



Table 2-7 shows the difference in mean scores between the LESCP longitudinal sample and 
all the LESCP students in the cohort. As noted above, longitudinal students scored higher than the other 
test takers in all four tests in all 3 years. The difference in means ranged from 3 to 7 points. 



Table 2-7. Difference in mean scores: LESCP longitudinal sample minus all LESCP test takers 

Test                        1997    1998    1999 
Reading closed-ended        5       7       6 
Reading open-ended          6       5       6 
Mathematics closed-ended    5       7       6 
Mathematics open-ended      4       3       5 



A pictorial display of the relationships among average closed-ended reading scores for all 
LESCP students, longitudinal LESCP students, national norms, and urban norms is shown in Figure 2-1. 
Scores for the other tests show essentially similar patterns. 



Figure 2-1. LESCP scores relative to national and urban norms for closed-ended reading (groups: third grade, 1997; fourth grade, 1998; fifth grade, 1999; series: full LESCP, longitudinal LESCP, national norms, urban norms) 



The LESCP longitudinal sample students stayed at the same level of performance relative to 
national norms between third and fifth grade on both closed-ended tests. That is, the number of points 
gained by the longitudinal sample was similar to the difference in points between the third and fifth grade 
norm groups. 5 For example, the increase in score between the third and fifth grades for the longitudinal 
sample in closed-ended reading was 39 points, while the difference in national norms for the third 
and fifth grades was 40 points. The only difference between the national norm group and the longitudinal 
sample was found on the open-ended reading test. 6 On this test, the longitudinal sample gained 12 points 
less than the norm group. 
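
The comparison above is simple arithmetic on the means in Table 2-6, and it can be sketched as follows. This is an illustrative calculation only, not the study's own code: the names `MEANS`, `gain`, and `gap_vs_norm` are hypothetical, and the only norm-group difference used is the 40-point figure for closed-ended reading quoted in the text.

```python
# Mean scale scores for the longitudinal sample, copied from Table 2-6.
MEANS = {
    "Reading closed-ended": {1997: 607, 1998: 628, 1999: 646},
    "Reading open-ended": {1997: 581, 1998: 607, 1999: 612},
    "Mathematics closed-ended": {1997: 597, 1998: 621, 1999: 642},
    "Mathematics open-ended": {1997: 586, 1998: 593, 1999: 617},
}


def gain(test, start=1997, end=1999):
    """Points gained by the longitudinal sample between two test years."""
    return MEANS[test][end] - MEANS[test][start]


def gap_vs_norm(test, norm_difference):
    """Sample gain minus the norm-group difference (negative = gained less)."""
    return gain(test) - norm_difference


# 40 is the third-to-fifth-grade norm difference stated in the text.
reading_gap = gap_vs_norm("Reading closed-ended", 40)
```

A gap near zero corresponds to the report's statement that the sample held its position relative to the national norms; the norm differences for the other tests are not given in this section, so they are left out.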



5 Note that the test publishers did not follow a group of students over time and track their progress as part of their norming procedure. All norm 
groups were tested at the same time; norm group comparisons across grade levels may not accurately estimate normal growth over time. 

6 The open-ended tests were normed with a smaller sample than the closed-ended tests; this may account for these differences. 


Poverty 7 

Year by year, student performance revealed the disadvantage of attending a high-poverty 
school or experiencing family poverty. 8 In both reading and mathematics, the schools serving higher 
proportions of poor students started out with lower achievement (as measured by third-grade scores) and 
failed to close the gap over time, although the gap did not widen. Figures 2-2 
and 2-3 display the scores on the closed-ended tests of reading and mathematics, by year and level of 
school poverty, for all LESCP test takers in third grade in 1997, fourth grade in 1998, and fifth grade in 
1999. The schools are divided into four categories by percentage of students eligible for free or 
reduced-price lunch: (1) the least-poor group of schools, where fewer than 50 percent of the students were 
eligible; (2) 50 percent to 75 percent; (3) 75 percent to 90 percent; and (4) the poorest group, with more 
than 90 percent eligible. In schools where fewer than 50 percent of the students were eligible, average 
student performance was above that of the national norm group in both reading and mathematics. At the 
other end of the poverty scale, students in the highest poverty schools started out scoring lowest on the 
SAT-9 and continued to score lowest over time, maintaining an equally large gap from the national 
norm group between third and fifth grade. 
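
The four school poverty categories can be expressed as a small classification rule. The sketch below is illustrative and is not taken from the study: the function name `poverty_band` is hypothetical, and the handling of a school at exactly 75 percent is an assumption, because the text does not say which of the two adjacent categories such a school falls in.

```python
def poverty_band(pct_eligible):
    """Return the school poverty category (1 = least poor, 4 = poorest).

    pct_eligible is the percentage of students eligible for free or
    reduced-price lunch, on a 0-100 scale.
    """
    if not 0 <= pct_eligible <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct_eligible < 50:
        return 1  # fewer than 50 percent eligible
    if pct_eligible < 75:
        return 2  # 50 to 75 percent (assumption: exactly 75 goes to band 3)
    if pct_eligible <= 90:
        return 3  # 75 to 90 percent (band 4 is "more than 90 percent")
    return 4
```

For example, `poverty_band(62.5)` returns 2 and `poverty_band(93)` returns 4.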

The combined effects of family poverty and attending a high-poverty school were still more 
serious, as Figures 2-4 and 2-5 display. In both reading and mathematics, we found that students in 
higher poverty schools had lower scores. At each level of school poverty, those students who were 
eligible for free or reduced-price lunch had lower scores. For example, in third-grade reading, the group 
of students with the lowest average score was the group of students who were eligible for free or reduced-price 
lunch and attended the highest poverty schools. The group that scored highest was the group of students 
who were not eligible for free or reduced-price lunch and attended the lowest poverty schools. Thus, there was a 
compounding negative effect of being poor and in a high-poverty school. 



7 The LESCP’s sole measure of poverty is eligibility for free or reduced-price lunch. This is a measure of poverty that is commonly used in 
Title I. For example, for a school to be able to implement its Title I program as a schoolwide program, the school must have at least 50 percent 
of its students eligible for free or reduced-price lunch. However, this measure covers a broad range of family incomes and poverty conditions. 
Other, more precise, measures of poverty may yield different findings. 

8 LESCP data also confirmed a very high correlation between race and poverty in the sampled schools. African American and Hispanic students 
were significantly more likely to be eligible for free or reduced-price lunch than white students. 
Figure 2-2. Average reading SAT-9 scale score for all LESCP students, grouped by school poverty level 

Figure 2-3. Average mathematics SAT-9 scale score for all LESCP students, grouped by school poverty level 

Figure 2-4. Average reading SAT-9 scale score for all LESCP students, grouped by poverty 

Figure 2-5. Average mathematics SAT-9 scale score for all LESCP students, grouped by poverty 

These relationships between poverty and achievement illustrate the problem that Title I 
addresses. Studies like this one assess the extent to which policies in Title I, such as the program’s 
adoption of standards-based reform, are making the kind of changes in the classroom that might alleviate 
the achievement deficits wrought by poverty. The rest of this report furnishes our answers. But first we 
briefly examine the relationship between SAT-9 achievement scores (the primary outcome measures for 
LESCP) and state assessments (the primary outcome measures used by states and school districts). 



Relationship Between the SAT-9 and State Assessments 

The analyses conducted for this report, focusing on the student outcomes associated with 
particular aspects of classroom curriculum and instruction, emphasized trends in individual student 
performance across years. Student performance on the SAT-9 gave us performance data in a common 
metric across all the study’s classrooms, with individual test scores that could be associated with students’ 
individual demographic characteristics, and their own teachers’ survey responses concerning curricular 
and instructional attitudes and practices. We did not have a comparable level of detail regarding 
performance on state tests, and of course those tests varied across states. However, Title I charges states 
and school districts with improving performance in relation to state standards, as measured by state 
assessments. Therefore, it was worth checking how well the SAT-9 results match up with the results of 
state assessments. 

Comparisons were possible when states administered an assessment at the same grade level 
and in the same year as the LESCP study administered the SAT-9 (third grade in 1997, fourth grade in 
any of the 3 years, and fifth grade in 1999). We correlated each school’s performance on the SAT-9 with 
its performance on the state assessment by comparing how the school did on each measure in relation to 
the other LESCP schools in the state. 9 Table 2-8 shows the results of this analysis. We correlated 
performance on the SAT-9 with performance on a state assessment in all seven LESCP states. 

As shown in Table 2-8, there was variation in the degree to which a school’s performance on 
the SAT-9 was correlated with its performance on the state assessment. This held true both across the 
states in the LESCP sample and, to some extent, across grade levels. 
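
The within-state comparison described above (and in footnote 9) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, the use of the population standard deviation is an assumption (the report does not say which form was used), and the sample values are invented for demonstration and are not LESCP data.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values against the mean and SD of the supplied group."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD; the report does not specify
    if s == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - m) / s for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def within_state_correlation(sat9_means, state_means):
    """Correlate schools' SAT-9 and state-test z-scores within one state."""
    return pearson(z_scores(sat9_means), z_scores(state_means))


# Hypothetical school means for one state and grade (not LESCP data).
sat9 = [598.0, 612.0, 605.0, 631.0, 620.0]
state = [41.0, 47.0, 50.0, 58.0, 52.0]
r = within_state_correlation(sat9, state)
```

Because the Pearson coefficient is unchanged by linear rescaling, standardizing within a single state does not alter the coefficient itself; its role is to place each school on a common scale relative to the other LESCP schools in its state.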



9 For each school, in both reading and mathematics, we computed the z-score, which placed the school within a distribution of other LESCP 
schools in their state for each SAT-9 and state assessment score. We then correlated the SAT-9 and state assessment z-scores with each other to 
determine which tests were correlated with each other. 
Table 2-8. Significant correlation coefficients among school rankings within the LESCP sample on the 
SAT-9 and on state assessment scores 

State and grade (year)      Reading    Mathematics
State A
  Third grade (1997)        .71        --
  Fourth grade (1997)       --         .79
  Fourth grade (1998)       --         .83
  Fourth grade (1999)       --         .72
State B
  Third grade (1997)        .80        .32
  Fifth grade (1999)        .78        .79
State C
  Fourth grade (1997)       .18        --
  Fourth grade (1998)       .49        --
State D
  Fourth grade (1998)       .63        --
  Fourth grade (1999)       .84        --
  Fifth grade (1999)        --         Sample too small for analysis
State E
  Fifth grade (1999)        .91        .70
State F
  Third grade (1997)        .58        .69
  Fifth grade (1999)        .91        .82
State G
  Third grade (1997)        .79        -.32
  Fourth grade (1997)       .96        .72
  Fourth grade (1998)       -.72       -.49
  Fourth grade (1999)       .68        .39
  Fifth grade (1999)        .73        .48

(-- marks a cell that shows no coefficient; see the note on darkened cells.)

Table reads: In State A, the state assessment was administered to students at the third-grade level in reading in 
1997 and at the fourth-grade level in mathematics in 1997, 1998, and 1999. We found a positive 
correlation of .71 between schools’ rankings within the LESCP sample on the state assessment and on the 
SAT-9 for third-grade reading. This relationship was significant at the .05 level. 

Note: Bold type indicates that correlation coefficients were significant at the .05 level. Darkened cells indicate that the state assessment was 
not administered at the specified grade level in that year. 





In five states (all except States C and G) there was a good correlation between relative 
rankings by SAT-9 and relative rankings by the, state assessments. This gives us some confidence that 
SAT-9 measures achievement in much the same way as many of the state assessments. It suggests that the 
SAT-9 may provide a good substitute for state assessments in the analysis of student achievement, 
although it clearly does not map perfectly onto the content and skills measured by all states. This study’s 
use of a test that was uniform across all participating schools, which was necessary to conduct the planned 
analyses, thus seems to have been a reasonable choice. 



Conclusions 

The SAT-9 tests offered information about several aspects of student performance in reading 
and mathematics. They showed, for example, that the LESCP sample as a whole performed below 
national and urban norms on these tests. These test data also revealed a persistent, negative relationship 
between poverty, at both the individual and school levels, and student achievement. Although the SAT-9 
measured somewhat different skills than any particular state test, the standardized tests did offer a 
comparable basis for measuring performance and growth across all the study’s classrooms. Thus, they 
provided much of the data for the analyses described in Chapter 3, which focuses on those instructional 
conditions that might alleviate the harmful effects of poverty on student performance. 

CLASSROOM AND SCHOOL VARIABLES RELATED 
TO STUDENT PERFORMANCE 



We now examine the relationship between classroom and school practices (see Box 3 in 
Figure 1-1) and student achievement (see Box 4 in Figure 1-1). The chapter addresses the question: To 
what extent do particular instructional practices or instructional supports show statistically significant 
relationships with the Longitudinal Evaluation of School Change and Performance (LESCP) students’ 
outcomes in reading and mathematics? The analyses presented in this chapter relate to the achievement of 
the LESCP longitudinal students in the closed-ended reading and mathematics assessments. We begin by 
describing the variables used in our analyses, including the characteristics of instruction and instructional 
support measured by our teacher surveys. We then describe our procedures for tracing the relationship 
between these variables and students’ performance on tests across the 3 years of the study, with variables 
like student and school poverty used as controls. Because much of this chapter relies on hierarchical 
linear modeling (HLM) analyses, we outline the key assumptions and procedures associated with that 
analytic method. Next, we present our models and findings for reading and mathematics. 



Variables and Methods Used in the Analysis of Student Performance 

Our models assume that both proximal variables and control variables may be directly 
related to student performance. Proximal variables have to do with instruction and are potentially 
changeable by reform initiatives. They include measures of teacher beliefs, classroom practices in 
curriculum and instruction, and instructional supports like professional development. We built the 
proximal variables from the survey responses of individual teachers as well as average values for groups 
of teachers, either the whole grade level or the whole school. We treat school-level 
variables as proximal variables in our statistical models because the average responses across 
groups of teachers, such as all kindergarten to fifth-grade (K-5) teachers in the school, tell us about the 
overall kind of academic press experienced by students in that school. We hypothesize that school 
environment as it relates to these proximal variables has an independent effect on student achievement 
above and beyond individual teacher practices. In short, whether found at the level of the classroom or the 
surrounding organization, the proximal variables are instructional conditions and supports that may most 
directly affect student learning. They are the targets of Title I policy and of standards-based reform, and, 
in Chapter 4, we will report on the extent to which reform policies are affecting them. 

In the sections dealing with reading and mathematics, we present detailed descriptions of the 
proximal variables for each subject area, built from teachers’ survey responses. A quick overview, 
however, is that these variables pertained to specific parts of an overall vision of standards-based reform: 
a framework of content and performance standards, together with assessments and curriculum keyed to 
those standards, that would command attention and guide classroom practice; curriculum and instruction 
designed to engage students in relatively advanced academic tasks rather than bogging down in rote drill 
and practice; teachers prepared to teach in new ways, having participated in professional development 
geared to the standards and assessments; and active communication between school and home. 1 

Control variables are also characteristics of the student or school, but they are not 
specifically related to instruction and are not amenable to intervention by Title I policy or standards-based 
reform. They include student- and school-level measures of poverty and initial achievement, and school 
size. Our control variables include the following: 

■ Student Poverty - whether an individual student was eligible to receive free or 
reduced-price lunch in 1998; 

■ School Poverty - the percentage of longitudinal students in the school eligible to 
receive free or reduced-price lunch in 1998; 

■ Student’s Initial Achievement Status - whether an individual student’s third-grade 
score on the closed-ended reading or mathematics portion of the Stanford 
Achievement Test, Ninth Edition (SAT-9) fell in the bottom quarter of the national 
distribution of scores 2 ; 

■ School’s Initial Achievement - the percentage of longitudinal students in the school 
whose third-grade scores fell in the bottom quarter on this test; and 



1 The interrelated parts of a standards-based reform vision are described in J. O’Day and M.S. Smith. (1993). Systemic reform and educational 
opportunity, in S. Fuhrman (Ed.), Designing coherent education policy. San Francisco: Jossey-Bass, pp. 250-312, and in L.B. Resnick and K.J. 
Nolan. (1995). Standards for education, in D. Ravitch (Ed.), Debating the future of American education: Do we need national standards and 
assessments? Washington, D.C.: Brookings Institution. Other sources for the variables measured in this study included work on curriculum and 
instruction, such as A. Porter. (1998). A curriculum out of balance: The case of elementary school mathematics. East Lansing, Mich.: Institute 
for Research on Teaching, College of Education, Michigan State University; and M.S. Knapp and P.M. Shields (Eds.). (1990). Better schooling 
for the children of poverty: Alternatives to conventional wisdom, Volume II: Commissioned papers and literature review. Washington, D.C.: 
U.S. Department of Education; and work on parent and family involvement such as S.L. Dauber and J.L. Epstein. (1993). Parent attitudes and 
practices of involvement in inner-city elementary and middle schools, in A.T. Henderson and N. Berla (Eds.). (1994). A new generation of 
evidence: The family is crucial to student achievement. Washington, D.C.: Center for Law and Education; and J. Ballen and O. Moles. (1994). 
Strong families, strong schools: Building community partnerships for learning. Washington, D.C.: U.S. Department of Education. 

2 This status was used as an indicator of initial achievement instead of the actual third-grade score because the third-grade score is already 
included in the model as a repeated measure. 